Second-Order Rate Region of 
Constant-Composition Codes for the 
Multiple-Access Channel 

Jonathan Scarlett, Alfonso Martinez and Albert Guillen i Fabregas 

Abstract 

This paper presents a new achievable second-order rate region for the discrete memoryless multiple-access channel. 
The result is obtained using the random-coding ensemble in which each user's codebook contains codewords of a fixed 
composition. The improvement of our second-order rate region over existing ones is demonstrated both analytically 
and numerically. 


I. Introduction 

Shannon's channel capacity describes the largest possible rate of transmission with vanishing error probability in 
coded communication systems. Further characterizations of the system performance are given by error exponents [1, 
Ch. 9], moderate deviations results [2], and second-order coding rates [3]. The latter has regained significant attention 
in recent years [4], [5], and is well understood for a variety of single-user channels. However, generalizations to 
multiuser settings have generally proved difficult even when the capacity region is known, primarily due to the lack 
of tight converse results. 

In this paper, we consider second-order coding rates for the discrete memoryless multiple-access channel (DM-MAC). 
This problem was previously studied in [6]-[9]. In particular, achievable second-order rate regions have been 
obtained using i.i.d. random coding with a random time-sharing sequence [6], [7] and a deterministic time-sharing 
sequence [8]. 

One may expect constant-composition random coding [1, Ch. 9] to yield a larger second-order rate region than 
that of i.i.d. random coding, since the former yields higher random-coding error exponents for the MAC [10]. 
The main result of this paper shows that this is indeed the case. Specifically, we show that the second-order rate 
region obtained from constant-composition coding can be strictly larger than those of [6]-[8] even after the full 
optimization of the parameters. A key tool in our analysis is a Berry-Esseen theorem associated with a variant of 
Hoeffding's combinatorial central limit theorem (CLT) [21]; see Section IV-B for details. 

A. Notation 

The set of all probability distributions on an alphabet $\mathcal{A}$ is denoted by $\mathcal{P}(\mathcal{A})$. Given a distribution $Q(x)$ and a 
conditional distribution $W(y|x)$, the joint distribution $Q(x)W(y|x)$ is denoted by $Q \times W$. We make use of the 
method of types [12, Ch. 2]. The set of all sequences of length $n$ with a given type $P_X$ is denoted by $T^n(P_X)$, and 
similarly for joint types. Given a sequence $\boldsymbol{x} \in T^n(P_X)$ and a conditional distribution $P_{Y|X}$, we define $T^n_{\boldsymbol{x}}(P_{Y|X})$ 
to be the set of sequences $\boldsymbol{y}$ such that $(\boldsymbol{x},\boldsymbol{y}) \in T^n(P_X \times P_{Y|X})$. 

Bold symbols are used for vectors and matrices. The vectors of zeros and ones are denoted by $\boldsymbol{0}$ and $\boldsymbol{1}$ respectively, 
and the $k \times k$ identity matrix is denoted by $\boldsymbol{I}_{k \times k}$. The symbols $\prec$, $\preceq$, etc. denote element-wise inequalities for 
vectors, and inequalities on the positive semidefinite cone for matrices (e.g. $\boldsymbol{M} \succ \boldsymbol{0}$ means $\boldsymbol{M}$ is positive definite). 
We denote the matrix inverse by $(\cdot)^{-1}$, the positive definite matrix square root by $(\cdot)^{1/2}$ and its inverse by $(\cdot)^{-1/2}$. 

The multivariate Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted by $N(\boldsymbol{\mu},\boldsymbol{\Sigma})$. 

The covariance matrix of a random vector $\boldsymbol{Z}$ is denoted by $\mathrm{Cov}[\boldsymbol{Z}]$. Overloading the notation, we denote the 
covariance of two scalar random variables $Z_1$ and $Z_2$ by $\mathrm{Cov}[Z_1,Z_2]$, and we write the variance as $\mathrm{Var}[Z_1] = 
\mathrm{Cov}[Z_1,Z_1]$. Logarithms have base $e$, and rates are in nats except in the examples, where bits are used. We denote 
the indicator function by $\mathbb{1}\{\cdot\}$. 

For two sequences $f(n)$ and $g(n)$, we write $f(n) = O(g(n))$ if $|f(n)| \le c|g(n)|$ for some $c$ and sufficiently 
large $n$, and $f(n) = o(g(n))$ if $\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0$. 

B. System Setup 

We consider a 2-user discrete memoryless MAC $W(y|x_1,x_2)$ with input alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$ and output alphabet 
$\mathcal{Y}$. The encoders and decoder operate as follows. Encoder $\nu = 1,2$ takes as input a message $m_\nu$ uniformly 
distributed on the set $\{1,\dots,M_\nu\}$, and transmits the corresponding codeword $\boldsymbol{x}_\nu^{(m_\nu)}$ from the codebook $\mathcal{C}_\nu = 
\{\boldsymbol{x}_\nu^{(1)},\dots,\boldsymbol{x}_\nu^{(M_\nu)}\}$. Upon receiving $\boldsymbol{y}$ at the output of the channel, the decoder forms an estimate $(\hat{m}_1,\hat{m}_2) = 
\phi(\mathcal{C}_1,\mathcal{C}_2,\boldsymbol{y})$, for some (possibly randomized) decoding function $\phi$. An error is said to have occurred if the estimate 
$(\hat{m}_1,\hat{m}_2)$ differs from $(m_1,m_2)$. 

A rate pair $(R_1,R_2)$ is said to be $(n,\epsilon)$-achievable if there exist codebooks with $M_1 \ge \exp(nR_1)$ and $M_2 \ge 
\exp(nR_2)$ codewords of length $n$ for users 1 and 2 respectively such that the average error probability does not 
exceed $\epsilon$. Given $(R_1,R_2)$, we define the rate vector 

$$\boldsymbol{R} \triangleq \begin{bmatrix} R_1 \\ R_2 \\ R_1 + R_2 \end{bmatrix}. \tag{1}$$

The achievability part of the capacity result of Ahlswede [13] and Liao [14] states that for any $\epsilon \in (0,1)$, rates 
satisfying 

$$\boldsymbol{R} \preceq \begin{bmatrix} I(X_1;Y|X_2,U) \\ I(X_2;Y|X_1,U) \\ I(X_1,X_2;Y|U) \end{bmatrix} + g_\epsilon(n)\boldsymbol{1} \tag{2}$$

are $(n,\epsilon)$-achievable for some $g_\epsilon(n)$ which vanishes as $n \to \infty$, under any joint distribution of the form $(U,X_1,X_2,Y) \sim 
P_U \times P_{X_1|U} \times P_{X_2|U} \times W$. Equation (2) is said to describe the first-order rate region. Second-order rate regions 
are of the form (2) with $g_\epsilon(n)\boldsymbol{1}$ replaced by the sum of a second-order term and an asymptotic third-order term. 

We consider constant-composition random coding, as considered by Liu and Hughes [10], among others. We fix 
a time-sharing alphabet $\mathcal{U}$, as well as the input distributions $Q_U(u)$, $Q_1(x_1|u)$ and $Q_2(x_2|u)$. We let $Q_{U,n}$, $Q_{1,n}$ 
and $Q_{2,n}$ denote the (conditional) types whose probabilities are $\frac{1}{n}$-close to those of $Q_U$, $Q_1$ and $Q_2$ respectively. 
For an arbitrary time-sharing sequence $\boldsymbol{u}$ with type $Q_{U,n}$, we generate the $M_\nu$ codewords of user $\nu = 1,2$ 
conditionally independently according to the uniform distribution on $T^n_{\boldsymbol{u}}(Q_{\nu,n})$, i.e. 

$$P_{\boldsymbol{X}_\nu|\boldsymbol{U}}(\boldsymbol{x}_\nu|\boldsymbol{u}) = \frac{1}{|T^n_{\boldsymbol{u}}(Q_{\nu,n})|}\,\mathbb{1}\big\{\boldsymbol{x}_\nu \in T^n_{\boldsymbol{u}}(Q_{\nu,n})\big\}. \tag{3}$$

Throughout the paper, we define the joint distribution 

$$P_{UX_1X_2Y}(u,x_1,x_2,y) \triangleq Q_U(u)Q_1(x_1|u)Q_2(x_2|u)W(y|x_1,x_2) \tag{4}$$

and denote the induced marginal distributions by $P_{Y|X_1U}$, $P_{Y|X_2U}$, etc. A key quantity in our analysis is the 
information density vector 

$$\boldsymbol{i}(u,x_1,x_2,y) \triangleq \begin{bmatrix} i_1(u,x_1,x_2,y) \\ i_2(u,x_1,x_2,y) \\ i_{12}(u,x_1,x_2,y) \end{bmatrix}, \tag{5}$$

where 

$$i_1(u,x_1,x_2,y) \triangleq \log\frac{W(y|x_1,x_2)}{P_{Y|X_2U}(y|x_2,u)}, \tag{6}$$

$$i_2(u,x_1,x_2,y) \triangleq \log\frac{W(y|x_1,x_2)}{P_{Y|X_1U}(y|x_1,u)}, \tag{7}$$

$$i_{12}(u,x_1,x_2,y) \triangleq \log\frac{W(y|x_1,x_2)}{P_{Y|U}(y|u)}. \tag{8}$$

It should be noted that averaging these quantities with respect to the distribution in (4) yields the mutual information 
quantities appearing in (2). 
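As a small illustration of (4)-(8), the following sketch tabulates the information density vector for a two-user channel without time-sharing and averages it to obtain the three mutual information quantities. The function names (`information_densities`, `mean_density`) and the noiseless binary adder used as a test channel are assumptions made for this example and do not appear in the paper.

```python
import math
from itertools import product

def information_densities(W, Q1, Q2):
    """Build i(x1, x2, y) = (i_1, i_2, i_12) in nats for a two-user DM-MAC.

    W[(x1, x2)] maps each output y to W(y|x1, x2); Q1 and Q2 map input
    symbols to probabilities. Time-sharing is omitted (single-symbol U).
    """
    P_y, P_y_x1, P_y_x2 = {}, {}, {}
    for (x1, x2), row in W.items():
        for y, w in row.items():
            P_y[y] = P_y.get(y, 0.0) + Q1[x1] * Q2[x2] * w
            P_y_x1[(y, x1)] = P_y_x1.get((y, x1), 0.0) + Q2[x2] * w  # P(y|x1)
            P_y_x2[(y, x2)] = P_y_x2.get((y, x2), 0.0) + Q1[x1] * w  # P(y|x2)

    def density(x1, x2, y):
        w = W[(x1, x2)][y]
        return (math.log(w / P_y_x2[(y, x2)]),   # i_1,  eq. (6)
                math.log(w / P_y_x1[(y, x1)]),   # i_2,  eq. (7)
                math.log(w / P_y[y]))            # i_12, eq. (8)
    return density

def mean_density(W, Q1, Q2):
    """Average the density vector over Q1 x Q2 x W, i.e. the law in eq. (4)."""
    density = information_densities(W, Q1, Q2)
    mean = [0.0, 0.0, 0.0]
    for (x1, x2), row in W.items():
        for y, w in row.items():
            p = Q1[x1] * Q2[x2] * w
            if p > 0:  # zero-probability triples contribute nothing
                for k, value in enumerate(density(x1, x2, y)):
                    mean[k] += p * value
    return mean

# Hypothetical test channel (not from the paper): noiseless adder Y = X1 + X2.
adder = {(a, b): {a + b: 1.0} for a, b in product((0, 1), repeat=2)}
uniform = {0: 0.5, 1: 0.5}
I1, I2, I12 = mean_density(adder, uniform, uniform)
```

For the adder with uniform inputs, the averages are the familiar $I(X_1;Y|X_2) = I(X_2;Y|X_1) = \log 2$ and $I(X_1,X_2;Y) = 1.5\log 2$ nats.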



C. Existing Results 

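We make use of the multivariate generalization of the Q-function given by 

$$Q_{\mathrm{inv}}(\boldsymbol{V},\epsilon) \triangleq \big\{\boldsymbol{z} \in \mathbb{R}^3 : \mathbb{P}[\boldsymbol{Z} \preceq \boldsymbol{z}] \ge 1 - \epsilon\big\}, \tag{9}$$

where $\boldsymbol{Z} \sim N(\boldsymbol{0},\boldsymbol{V})$. Since the existing second-order rate regions (and the one given in this paper) are written in a 
similar form in terms of a matrix, a vector, and the function $Q_{\mathrm{inv}}$, we define the following notion of achievability. 

Definition 1. Let $\boldsymbol{I}$ be a $3 \times 1$ vector, and let $\boldsymbol{V}$ be a $3 \times 3$ matrix. The pair $(\boldsymbol{I},\boldsymbol{V})$ is said to be achievable if 
there exists a $g(n) = o(\sqrt{n})$ such that, for all $\epsilon \in (0,1)$ and $(R_1,R_2)$ satisfying 

$$n\boldsymbol{R} \in n\boldsymbol{I} - \sqrt{n}\,Q_{\mathrm{inv}}(\boldsymbol{V},\epsilon) + g(n)\boldsymbol{1}, \tag{10}$$

the pair $(R_1,R_2)$ is $(n,\epsilon)$-achievable, where $\boldsymbol{R}$ is defined in (1). 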
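Membership in the set (9) can be probed numerically. The sketch below is a Monte Carlo estimate rather than an exact test, and it assumes that $\boldsymbol{V}$ is positive definite so that a Cholesky factor exists (the paper also discusses a rank-one dispersion matrix, which this sketch does not handle). The names `cholesky3` and `in_q_inv`, the sample count and the seed are illustrative choices.

```python
import math
import random

def cholesky3(V):
    """Lower-triangular L with L L^T = V, for a symmetric positive definite 3x3 V."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def in_q_inv(V, eps, z, samples=50_000, seed=0):
    """Estimate whether P[Z <= z] >= 1 - eps (element-wise) for Z ~ N(0, V)."""
    rng = random.Random(seed)          # fixed seed keeps the estimate reproducible
    L = cholesky3(V)
    hits = 0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        Z = [sum(L[i][k] * g[k] for k in range(i + 1)) for i in range(3)]
        hits += all(Z[i] <= z[i] for i in range(3))
    return hits / samples >= 1.0 - eps
```

Points close to the boundary of the set need more samples (or a deterministic Gaussian CDF routine) before the answer can be trusted.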

The first study of the problem under consideration was by Tan and Kosut [6], who used i.i.d. random coding 
to prove that $(\boldsymbol{I},\boldsymbol{V})$ with $\boldsymbol{I} = \mathbb{E}[\boldsymbol{i}(U,X_1,X_2,Y)]$ and $\boldsymbol{V} = \mathrm{Cov}[\boldsymbol{i}(U,X_1,X_2,Y)]$ is achievable for any choice of 
$\mathcal{U}$ and $(Q_U,Q_1,Q_2)$. MolavianJazi and Laneman [7] showed that similar second-order estimates can be obtained 
by treating the three error events separately rather than jointly, and using just three variance terms instead of 
a full $3 \times 3$ covariance matrix. Huang and Moulin [8] showed that the covariance matrix can be improved to 
$\boldsymbol{V} = \mathbb{E}[\mathrm{Cov}[\boldsymbol{i}(U,X_1,X_2,Y)\,|\,U]]$ by fixing a constant-composition time-sharing sequence $\boldsymbol{u}$, rather than generating 
one at random. 

For certain classes of channels, the present problem can be reduced to a single-user problem in order to obtain 
a matching converse to the above achievability results [9]. However, very few results on a general converse have 
been proved, and the existing outer bounds are generally very loose and/or applicable only under the maximal error 
probability criterion (e.g. see [8, Sec. III]). 

A simple improvement on the achievability result of [8] can be obtained by letting one user's codebook be 
constant-composition and the other i.i.d., yielding a covariance matrix of the form $\boldsymbol{V} = \mathbb{E}[\mathrm{Cov}[\boldsymbol{i}(U,X_1,X_2,Y)\,|\,U,X_1]]$ 
or $\boldsymbol{V} = \mathbb{E}[\mathrm{Cov}[\boldsymbol{i}(U,X_1,X_2,Y)\,|\,U,X_2]]$. In the following section, we give a covariance matrix which improves 
further on each of these. 

II. Main Result 

The main result of this paper is as follows. 

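Theorem 1. Fix any finite time-sharing alphabet $\mathcal{U}$ and the input distributions $(Q_U,Q_1,Q_2)$. The pair $(\boldsymbol{I},\boldsymbol{V})$ is 
achievable, where 

$$\boldsymbol{I} = \begin{bmatrix} I_1 \\ I_2 \\ I_{12} \end{bmatrix}, \tag{11}$$

$$\boldsymbol{V} = \begin{bmatrix} V_{1,1} & V_{1,2} & V_{1,12} \\ V_{2,1} & V_{2,2} & V_{2,12} \\ V_{12,1} & V_{12,2} & V_{12,12} \end{bmatrix}, \tag{12}$$

and for $\nu,\nu' = 1, 2, 12$, 

$$I_\nu = \mathbb{E}[i_\nu(U,X_1,X_2,Y)], \tag{13}$$
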

March 26, 2013 



DRAFT 






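$$V_{\nu,\nu'} = \mathbb{E}\Big[\mathrm{Cov}\big[i_\nu(U,X_1,X_2,Y),\,i_{\nu'}(U,X_1,X_2,Y)\,\big|\,U\big] - \mathrm{Cov}\big[i_\nu(U,X_1,X_2,Y),\,i_{\nu'}(U,X_1,\overline{X}_2,\overline{Y})\,\big|\,U\big] - \mathrm{Cov}\big[i_\nu(U,X_1,X_2,Y),\,i_{\nu'}(U,\overline{X}_1,X_2,\overline{\overline{Y}})\,\big|\,U\big]\Big], \tag{14}$$

$$(U,X_1,X_2,Y,\overline{X}_1,\overline{X}_2,\overline{Y},\overline{\overline{Y}}) \sim Q_U(u)Q_1(x_1|u)Q_2(x_2|u)W(y|x_1,x_2)\,Q_1(\overline{x}_1|u)Q_2(\overline{x}_2|u)W(\overline{y}|x_1,\overline{x}_2)W(\overline{\overline{y}}|\overline{x}_1,x_2). \tag{15}$$

Furthermore, the function $g(n)$ in (10) satisfies $g(n) = O(\log n)$. 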

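Proof: See Section IV. 

For the diagonal elements of $\boldsymbol{V}$, i.e. those with $\nu = \nu'$, (14) simplifies to 

$$V_{\nu,\nu} = \mathbb{E}\Big[\mathrm{Var}\big[i_\nu(U,X_1,X_2,Y)\,\big|\,U\big] - \mathrm{Var}\big[i_\nu^{(1)}(U,X_1)\,\big|\,U\big] - \mathrm{Var}\big[i_\nu^{(2)}(U,X_2)\,\big|\,U\big]\Big], \tag{16}$$

where $i_\nu^{(1)}(U,X_1) \triangleq \mathbb{E}[i_\nu(U,X_1,X_2,Y)\,|\,U,X_1]$ and $i_\nu^{(2)}(U,X_2) \triangleq \mathbb{E}[i_\nu(U,X_1,X_2,Y)\,|\,U,X_2]$. In this form, the 
diagonal terms have a pleasing interpretation. The term $\mathrm{Var}[i_\nu]$ represents the variations in $(X_1,X_2,Y)$ in the i.i.d. 
case, and the terms $\mathrm{Var}[i_\nu^{(1)}]$ and $\mathrm{Var}[i_\nu^{(2)}]$ represent the reduced variations in $X_1$ and $X_2$ respectively, resulting 
from the codewords having a fixed composition. In particular, using constant-composition coding for user 1 and 
i.i.d. coding for user 2, we instead obtain variance terms of the form 

$$V^{\mathrm{cc\text{-}iid}}_{\nu,\nu} = \mathbb{E}\Big[\mathrm{Var}\big[i_\nu(U,X_1,X_2,Y)\,\big|\,U\big] - \mathrm{Var}\big[i_\nu^{(1)}(U,X_1)\,\big|\,U\big]\Big] \tag{17}$$

$$= \mathbb{E}\Big[\mathrm{Var}\big[i_\nu(U,X_1,X_2,Y)\,\big|\,U,X_1\big]\Big], \tag{18}$$

thus recovering the result stated in Section I-C. The quantity $V_{\nu,\nu}$ is clearly less than or equal to $V^{\mathrm{cc\text{-}iid}}_{\nu,\nu}$, which in 
turn is less than or equal to the analogous variance resulting from conditional i.i.d. random coding, given by [8] 

$$V^{\mathrm{iid}}_{\nu,\nu} = \mathbb{E}\Big[\mathrm{Var}\big[i_\nu(U,X_1,X_2,Y)\,\big|\,U\big]\Big]. \tag{19}$$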
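The ordering of the three variance terms can be checked numerically. The sketch below evaluates them for $\nu = 12$ on the collision channel of Section III without time-sharing; because that channel is deterministic, $i_{12}$ is a function of $(x_1,x_2)$ only. The function name `variance_terms_12` and the dictionary keys are assumptions made for this example, and the inputs are assumed to satisfy $0 < p_1, p_2 < 1/2$ so that every probability is positive.

```python
import math
from itertools import product

def variance_terms_12(p1, p2):
    """Variance terms for nu = 12 on the collision channel, in nats^2."""
    Q1 = (1 - 2 * p1, p1, p1)
    Q2 = (1 - 2 * p2, p2, p2)

    def output(x1, x2):
        # The pair is received noiselessly if either symbol is 0, else collision.
        return (x1, x2) if min(x1, x2) == 0 else "c"

    pairs = list(product(range(3), repeat=2))
    P_y = {}
    for x1, x2 in pairs:
        y = output(x1, x2)
        P_y[y] = P_y.get(y, 0.0) + Q1[x1] * Q2[x2]

    # Deterministic channel: i_12 = log(1 / P_Y(y)) with y fixed by (x1, x2).
    i12 = {(a, b): -math.log(P_y[output(a, b)]) for a, b in pairs}
    mean = sum(Q1[a] * Q2[b] * i12[(a, b)] for a, b in pairs)
    var_iid = sum(Q1[a] * Q2[b] * (i12[(a, b)] - mean) ** 2 for a, b in pairs)

    # Conditional means E[i_12 | X1] and E[i_12 | X2], as defined after (16).
    cond1 = [sum(Q2[b] * i12[(a, b)] for b in range(3)) for a in range(3)]
    cond2 = [sum(Q1[a] * i12[(a, b)] for a in range(3)) for b in range(3)]
    var1 = sum(Q1[a] * (cond1[a] - mean) ** 2 for a in range(3))
    var2 = sum(Q2[b] * (cond2[b] - mean) ** 2 for b in range(3))

    return {"iid": var_iid,                    # eq. (19)
            "cc-iid": var_iid - var1,          # eq. (17)
            "cc": var_iid - var1 - var2}       # eq. (16)
```

Calling `variance_terms_12(0.2, 0.2)` returns the three values for the input distributions used in the example of Section III.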



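It is interesting to compare (16) with the conditional variance 

$$V^{\mathrm{joint}}_{\nu,\nu} \triangleq \mathbb{E}\Big[\mathrm{Var}\big[i_\nu(U,X_1,X_2,Y)\,\big|\,U\big] - \mathrm{Var}\big[i_\nu^{(12)}(U,X_1,X_2)\,\big|\,U\big]\Big] \tag{20}$$

$$= \mathbb{E}\Big[\mathrm{Var}\big[i_\nu(U,X_1,X_2,Y)\,\big|\,U,X_1,X_2\big]\Big], \tag{21}$$

where $i_\nu^{(12)}(U,X_1,X_2) \triangleq \mathbb{E}[i_\nu(U,X_1,X_2,Y)\,|\,U,X_1,X_2]$. Roughly speaking, this is the variance which we would 
obtain if the joint composition of $(U,X_1,X_2)$ were fixed, which is impossible in general in the absence of 
cooperation. Based on this observation, we expect that $V^{\mathrm{joint}}_{\nu,\nu} \le V_{\nu,\nu}$. This can be verified by using the law 
of total variance to write 

$$\mathrm{Var}\big[i_\nu^{(12)}(u,X_1,X_2)\big] = \mathrm{Var}\Big[\mathbb{E}\big[i_\nu^{(12)}(u,X_1,X_2)\,\big|\,X_1\big]\Big] + \mathbb{E}\Big[\mathrm{Var}\big[i_\nu^{(12)}(u,X_1,X_2)\,\big|\,X_1\big]\Big] \tag{22}$$

$$= \mathrm{Var}\big[i_\nu^{(1)}(u,X_1)\big] + \mathbb{E}\Big[\mathrm{Var}\big[i_\nu^{(12)}(u,X_1,X_2)\,\big|\,X_1\big]\Big]. \tag{23}$$

The latter quantity can be lower bounded as follows: 

$$\mathbb{E}\Big[\mathrm{Var}\big[i_\nu^{(12)}(u,X_1,X_2)\,\big|\,X_1\big]\Big] = \sum_{x_1,x_2} Q_1(x_1|u)Q_2(x_2|u)\Big(i_\nu^{(12)}(u,x_1,x_2) - \mathbb{E}\big[i_\nu^{(12)}(u,x_1,X_2)\big]\Big)^2 \tag{24}$$

$$\ge \sum_{x_2} Q_2(x_2|u)\Big(\sum_{x_1} Q_1(x_1|u)\,i_\nu^{(12)}(u,x_1,x_2) - \mathbb{E}\big[i_\nu^{(12)}(u,X_1,X_2)\big]\Big)^2 \tag{25}$$

$$= \sum_{x_2} Q_2(x_2|u)\Big(i_\nu^{(2)}(u,x_2) - \mathbb{E}\big[i_\nu^{(2)}(u,X_2)\big]\Big)^2 \tag{26}$$

$$= \mathrm{Var}\big[i_\nu^{(2)}(u,X_2)\big], \tag{27}$$

where (25) follows from Jensen's inequality, and (26) follows from the definitions of $i_\nu^{(12)}$ and $i_\nu^{(2)}$. Combining 
(16), (20), (23) and (27), we obtain $V^{\mathrm{joint}}_{\nu,\nu} \le V_{\nu,\nu}$. 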
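The final combination can be written out as follows. This only restates the steps of the argument in one place and adds no new result; the argument $u$ is the fixed time-sharing symbol and the outer expectation is over $U$.

```latex
\begin{align*}
\mathrm{Var}\big[i_\nu^{(12)}(u,X_1,X_2)\big]
  &\ge \mathrm{Var}\big[i_\nu^{(1)}(u,X_1)\big]
     + \mathrm{Var}\big[i_\nu^{(2)}(u,X_2)\big], \\
V^{\mathrm{joint}}_{\nu,\nu}
  = \mathbb{E}\Big[\mathrm{Var}[i_\nu \,|\, U]
     - \mathrm{Var}\big[i_\nu^{(12)} \,\big|\, U\big]\Big]
  &\le \mathbb{E}\Big[\mathrm{Var}[i_\nu \,|\, U]
     - \mathrm{Var}\big[i_\nu^{(1)} \,\big|\, U\big]
     - \mathrm{Var}\big[i_\nu^{(2)} \,\big|\, U\big]\Big]
  = V_{\nu,\nu}.
\end{align*}
```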

III. Example: The Collision Channel 
We consider the channel with $\mathcal{X}_1 = \mathcal{X}_2 = \{0,1,2\}$, $\mathcal{Y} = \{(0,0),(0,1),(0,2),(1,0),(2,0),c\}$ and 

$$W(y|x_1,x_2) = \begin{cases} 1 & y = (x_1,x_2) \text{ and } \min\{x_1,x_2\} = 0 \\ 1 & y = c \text{ and } \min\{x_1,x_2\} \ne 0 \\ 0 & \text{otherwise.} \end{cases} \tag{28}$$

In words, if either user transmits the zero symbol then the pair $(x_1,x_2)$ is received noiselessly, whereas if both 
users transmit a non-zero symbol then the output is $c$, meaning "collision". 

We recall the following observations by Gallager [15]: (i) The capacity region can be obtained without time 
sharing;¹ (ii) By symmetry, the points on the boundary of the capacity region are achieved using $\mathcal{U} = \emptyset$ and 
input distributions of the form $Q_1 = (1 - 2p_1, p_1, p_1)$ and $Q_2 = (1 - 2p_2, p_2, p_2)$; (iii) The achievable rate region 
corresponding to any such $(Q_1,Q_2)$ pair is rectangular. To illustrate these observations, we plot the capacity region 
in Figure 1 along with three achievable rate regions corresponding to particular choices of $p_1$ and $p_2$. 
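Observation (iii) can be checked numerically for a given pair $(p_1,p_2)$. The sketch below computes the three first-order rates in bits per use directly from output entropies, which is valid here because the collision channel is deterministic. The helper names `entropy_bits` and `collision_rates` are assumptions made for this example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def collision_rates(p1, p2):
    """Return (I1, I2, I12) in bits/use for Q1 = (1-2p1, p1, p1), Q2 = (1-2p2, p2, p2)."""
    Q1 = (1 - 2 * p1, p1, p1)
    Q2 = (1 - 2 * p2, p2, p2)
    # Deterministic channel, so I(X1;Y|X2) = H(Y|X2), and similarly for user 2.
    # Given x2 = 0 the output reveals x1; otherwise it reveals only whether x1 = 0.
    I1 = Q2[0] * entropy_bits(Q1) + (1 - Q2[0]) * entropy_bits((Q1[0], 1 - Q1[0]))
    I2 = Q1[0] * entropy_bits(Q2) + (1 - Q1[0]) * entropy_bits((Q2[0], 1 - Q2[0]))
    # Output law: five noiseless pairs plus the collision symbol c.
    P_y = [Q1[0] * Q2[0]]
    P_y += [Q1[0] * Q2[b] for b in (1, 2)]
    P_y += [Q1[a] * Q2[0] for a in (1, 2)]
    P_y.append((1 - Q1[0]) * (1 - Q2[0]))
    I12 = entropy_bits(P_y)
    return I1, I2, I12
```

A rectangular region corresponds to the sum-rate constraint being inactive, i.e. `I12` equal to `I1 + I2` up to floating-point error.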

We first compare the various random-coding schemes with fixed input distributions. Figure 2 plots the second-order 
regions with $p_1 = p_2 = 0.2$, $n = 50$ and $\epsilon = 0.01$, and with the third-order $o(\sqrt{n})$ terms ignored. It should 
be noted that these ignored terms can be significant at finite block lengths, and thus the resulting curves should 
only be viewed as approximations. 

The improvement over [8] obtained by letting user 1's codewords be constant-composition is insignificant at 
small values of $R_1$ but significant at high values of $R_1$, and similarly for user 2. The region resulting from 
Theorem 1, obtained using constant-composition codes for both users, is strictly larger than all of the others, and 
yields particularly large gains near the corner point. It is interesting to note that it is the only one which yields 
a rectangular second-order region for this particular choice of $p_1$ and $p_2$. This results from a rank-one dispersion 
matrix (e.g. see [6]). 



¹ On the other hand, for the collision channel with $K$ non-zero symbols, time-sharing is required for $K > 8$. 



Figure 1. Capacity region of the collision channel. (Plot not reproduced; the horizontal axis is $R_1$ in bits/use.) 



Figure 2. Second-order rate regions for the collision channel with $p_1 = p_2 = 0.2$, $n = 50$ and $\epsilon = 0.01$. (Plot not reproduced; the horizontal axis is $R_1$ in bits/use, and the legend entries are: Achievable rate region, Theorem 1, User 1 fixed composition, User 2 fixed composition, i.i.d.) 



The preceding example shows that constant-composition codes can perform better than i.i.d. codes for a given 
choice of $Q_1$ and $Q_2$. In the remainder of this section, we argue that the gains remain present even after the 
full optimization, as is the case for the random-coding error exponents of certain MACs [10]. In contrast, in the 
single-user setting, constant-composition codes yield higher error exponents and second-order rates for a given input 
distribution, but no gain after the optimization of the input distribution [12, Ex. 10.33], [4], [5]. 

For any given $n$, one can take the union of the achievable second-order regions in Theorem 1 (with the third-order 
term ignored) over all $(Q_U,Q_1,Q_2)$. We denote the resulting region by $\mathcal{R}^*_n$, and we say that $(Q_U,Q_1,Q_2)$ 
is first-order optimal (respectively, second-order optimal) if it achieves a point on the boundary of the capacity 
region (respectively, the boundary of $\mathcal{R}^*_n$). As $n$ grows large, the second-order term in (10) becomes insignificant 
compared to the first-order term, and we conclude that any sequence of second-order optimal input distributions 
must be asymptotically first-order optimal. Thus, we will obtain the desired result by showing that our variance 
terms $V_{1,1}$, $V_{2,2}$ and $V_{12,12}$ are strictly smaller than the analogous quantities in [8] under all first-order optimal 
input distributions. It suffices to consider the case $\mathcal{U} = \emptyset$, since otherwise these quantities are simply weighted 
sums of the corresponding quantities under $(Q_1(\cdot|u),Q_2(\cdot|u))$, weighted by $Q_U$. In fact, as stated above, it suffices 
to consider distributions of the form $Q_1 = (1 - 2p_1, p_1, p_1)$ and $Q_2 = (1 - 2p_2, p_2, p_2)$. 

Recall the variance terms for i.i.d. random coding given in (19). We observe that $V_{\nu,\nu} \le V^{\mathrm{iid}}_{\nu,\nu}$ with equality if 
and only if $\mathrm{Var}[i_\nu^{(1)}(X_1)] = 0$ and $\mathrm{Var}[i_\nu^{(2)}(X_2)] = 0$; the quantities $i_\nu^{(1)}$ and $i_\nu^{(2)}$ are defined as in (16) after 
eliminating the time-sharing variable. By a direct calculation, it can be shown that 

$$i_{12}^{(1)}(x_1) = (1 - 2p_2)\log\frac{1}{1 - 2p_2} + 2p_2\log\frac{1}{p_2} + \log\frac{1}{Q_1(x_1)},$$

which yields zero variance if and only if $p_1 = \frac{1}{3}$ (i.e. $Q_1 = (\frac{1}{3},\frac{1}{3},\frac{1}{3})$). Similarly, $i_{12}^{(2)}(X_2)$ has zero variance if and 
only if $p_2 = \frac{1}{3}$. However, from Figure 1, we know that $p_1 = p_2 = \frac{1}{3}$ is not first-order optimal. A similar argument 
holds for $i_1^{(1)}$, $i_1^{(2)}$, $i_2^{(1)}$ and $i_2^{(2)}$, except that the condition $p_1 = p_2 = \frac{1}{3}$ is replaced by $p_1 = p_2 = 0.2867$. Once 
again, we see from Figure 1 that this choice is not first-order optimal. Thus, for $\nu \in \{1, 2, 12\}$, we have $V_{\nu,\nu} < V^{\mathrm{iid}}_{\nu,\nu}$ 
for all first-order optimal input distributions. 



IV. Proof of Theorem 1 

For clarity of exposition, we present the proof in the absence of time-sharing, and we assume that the input 
distributions $Q_1$ and $Q_2$ and block length $n$ are such that $nQ_1(x_1)$ and $nQ_2(x_2)$ are integer-valued for all $x_1$ and 
$x_2$. We write $i_\nu(x_1,x_2,y)$ to denote the quantities in (6)-(8) with the conditioning on $u$ removed. In Section IV-C, 
we state the changes in the proof required to handle the general case. 



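Using the notation of Section I-B with the time-sharing sequence removed, we define the random variables 

$$(\boldsymbol{X}_1,\boldsymbol{X}_2,\overline{\boldsymbol{X}}_1,\overline{\boldsymbol{X}}_2,\boldsymbol{Y}) \sim P_{\boldsymbol{X}_1}(\boldsymbol{x}_1)P_{\boldsymbol{X}_2}(\boldsymbol{x}_2)P_{\boldsymbol{X}_1}(\overline{\boldsymbol{x}}_1)P_{\boldsymbol{X}_2}(\overline{\boldsymbol{x}}_2)W^n(\boldsymbol{y}|\boldsymbol{x}_1,\boldsymbol{x}_2), \tag{29}$$

where $W^n(\boldsymbol{y}|\boldsymbol{x}_1,\boldsymbol{x}_2) \triangleq \prod_{i=1}^n W(y_i|x_{1,i},x_{2,i})$. We make use of the threshold-based bound on the random-coding 
error probability $p_e$ given by [16, Thm. 3] 

$$p_e \le \mathbb{P}\big[\boldsymbol{i}^n(\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y}) \nsucc \boldsymbol{\gamma}\big] + (M_1 - 1)\,\mathbb{P}\big[i_1^n(\overline{\boldsymbol{X}}_1,\boldsymbol{X}_2,\boldsymbol{Y}) > \gamma_1\big] + (M_2 - 1)\,\mathbb{P}\big[i_2^n(\boldsymbol{X}_1,\overline{\boldsymbol{X}}_2,\boldsymbol{Y}) > \gamma_2\big] + (M_1 - 1)(M_2 - 1)\,\mathbb{P}\big[i_{12}^n(\overline{\boldsymbol{X}}_1,\overline{\boldsymbol{X}}_2,\boldsymbol{Y}) > \gamma_{12}\big], \tag{30}$$

where $\boldsymbol{\gamma} = [\gamma_1\ \gamma_2\ \gamma_{12}]^T$ is arbitrary, and 

$$\boldsymbol{i}^n(\boldsymbol{x}_1,\boldsymbol{x}_2,\boldsymbol{y}) \triangleq \sum_{i=1}^n \boldsymbol{i}(x_{1,i},x_{2,i},y_i), \tag{31}$$

$$i_\nu^n(\boldsymbol{x}_1,\boldsymbol{x}_2,\boldsymbol{y}) \triangleq \sum_{i=1}^n i_\nu(x_{1,i},x_{2,i},y_i), \tag{32}$$

where $\boldsymbol{i}$ and $i_\nu$ are defined in (5)-(8). The quantities $i_\nu^n$ should not be confused with the information densities 
obtained by replacing the distributions in (6)-(8) with their multivariate counterparts (e.g. $P_{\boldsymbol{X}}(\boldsymbol{x})$). Under constant-composition 
random coding, the latter quantities differ from $i_\nu^n$, and thus (30) differs from the usual dependence-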
testing (DT) bound given in AT] Thm. 2]. 

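We claim that the second, third and fourth terms of (30) can be upper bounded by $M_\nu p_0(n)e^{-\gamma_\nu}$ for $\nu = 1, 2, 12$ 
respectively, where $p_0(n)$ is a polynomial depending only on the alphabet sizes, and 

$$M_{12} \triangleq M_1M_2. \tag{33}$$

We prove this for $\nu = 12$ only, since the other two are handled similarly. We write $Q_\nu^n(\boldsymbol{x}_\nu) \triangleq \prod_{i=1}^n Q_\nu(x_{\nu,i})$ for 
$\nu = 1,2$, and make use of the fact that 

$$P_{\boldsymbol{X}_\nu}(\boldsymbol{x}_\nu) = \frac{1}{\mu_{\nu,n}}\,Q_\nu^n(\boldsymbol{x}_\nu)\,\mathbb{1}\{\boldsymbol{x}_\nu \in T^n(Q_\nu)\}, \tag{34}$$

where $\mu_{\nu,n} \triangleq \mathbb{P}[\boldsymbol{X}'_\nu \in T^n(Q_\nu)]$ with $\boldsymbol{X}'_\nu \sim Q_\nu^n(\boldsymbol{x}_\nu)$. Using standard properties of types, we have $\mu_{\nu,n} \ge 
(n+1)^{-(|\mathcal{X}_\nu| - 1)}$ [12, p. 17]. We therefore obtain 

$$M_1M_2\,\mathbb{P}\big[i_{12}^n(\overline{\boldsymbol{X}}_1,\overline{\boldsymbol{X}}_2,\boldsymbol{Y}) > \gamma_{12}\big] = M_1M_2 \sum_{\overline{\boldsymbol{x}}_1,\overline{\boldsymbol{x}}_2,\boldsymbol{y}} P_{\boldsymbol{X}_1}(\overline{\boldsymbol{x}}_1)P_{\boldsymbol{X}_2}(\overline{\boldsymbol{x}}_2)\Big(\sum_{\boldsymbol{x}_1,\boldsymbol{x}_2} P_{\boldsymbol{X}_1}(\boldsymbol{x}_1)P_{\boldsymbol{X}_2}(\boldsymbol{x}_2)W^n(\boldsymbol{y}|\boldsymbol{x}_1,\boldsymbol{x}_2)\Big)\mathbb{1}\big\{i_{12}^n(\overline{\boldsymbol{x}}_1,\overline{\boldsymbol{x}}_2,\boldsymbol{y}) > \gamma_{12}\big\} \tag{35}$$

$$\le \frac{M_1M_2}{\mu_{1,n}\mu_{2,n}} \sum_{\overline{\boldsymbol{x}}_1,\overline{\boldsymbol{x}}_2,\boldsymbol{y}} P_{\boldsymbol{X}_1}(\overline{\boldsymbol{x}}_1)P_{\boldsymbol{X}_2}(\overline{\boldsymbol{x}}_2)\Big(\sum_{\boldsymbol{x}_1,\boldsymbol{x}_2} Q_1^n(\boldsymbol{x}_1)Q_2^n(\boldsymbol{x}_2)W^n(\boldsymbol{y}|\boldsymbol{x}_1,\boldsymbol{x}_2)\Big)\mathbb{1}\big\{i_{12}^n(\overline{\boldsymbol{x}}_1,\overline{\boldsymbol{x}}_2,\boldsymbol{y}) > \gamma_{12}\big\} \tag{36}$$

$$\le \frac{M_1M_2}{\mu_{1,n}\mu_{2,n}} \sum_{\overline{\boldsymbol{x}}_1,\overline{\boldsymbol{x}}_2,\boldsymbol{y}} P_{\boldsymbol{X}_1}(\overline{\boldsymbol{x}}_1)P_{\boldsymbol{X}_2}(\overline{\boldsymbol{x}}_2)W^n(\boldsymbol{y}|\overline{\boldsymbol{x}}_1,\overline{\boldsymbol{x}}_2)\,e^{-\gamma_{12}} \tag{37}$$

$$\le M_1M_2\,p_0(n)\,e^{-\gamma_{12}}, \tag{38}$$

where (36) follows from (34) and by summing over all sequences instead of just those in $T^n(Q_\nu)$, (37) follows 
by using the definition of $i_\nu$ and upper bounding the indicator function, and we have defined $p_0(n) \triangleq (n+1)^{|\mathcal{X}_1| + |\mathcal{X}_2| - 2}$. 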
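The type-class probability $\mu_{\nu,n}$ in (34) and the polynomial lower bound quoted from the method of types can be evaluated exactly for small block lengths. The sketch below does so with rational arithmetic; the function name `type_class_probability` and the representation of a type by its vector of symbol counts are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial

def type_class_probability(counts):
    """Exact P[X' in T^n(Q)] for X' i.i.d. ~ Q, where Q(a) = counts[a] / n."""
    n = sum(counts)
    size = factorial(n)
    for k in counts:
        size //= factorial(k)             # |T^n(Q)| is a multinomial coefficient
    prob_each = Fraction(1)
    for k in counts:
        prob_each *= Fraction(k, n) ** k  # Q^n(x) is the same for every x in the class
    return size * prob_each

def polynomial_lower_bound(counts):
    """The bound (n + 1)^{-(|X| - 1)} on the probability above."""
    n = sum(counts)
    return Fraction(1, (n + 1) ** (len(counts) - 1))
```

For example, `type_class_probability((3, 1, 1))` gives the probability that five i.i.d. draws from $Q = (3/5, 1/5, 1/5)$ have exactly that composition, which can be compared with `polynomial_lower_bound((3, 1, 1))`.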

Returning to (30), we have thus far shown that 

$$p_e \le \mathbb{P}\big[\boldsymbol{i}^n(\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y}) \nsucc \boldsymbol{\gamma}\big] + p_0(n)\sum_\nu M_\nu e^{-\gamma_\nu}, \tag{39}$$

where the summation is over $\nu = 1, 2, 12$. Using this bound with 

$$\gamma_\nu = \log M_\nu + \Big(d + \frac{1}{2}\Big)\log n, \tag{40}$$

where $d = |\mathcal{X}_1| + |\mathcal{X}_2| - 2$ is the order of the polynomial $p_0(n)$, the statement of the theorem will follow using 
identical steps to [6, Thm. 2] once we prove the following: 

1) For all $\nu,\nu' \in \{1, 2, 12\}$, $\mathbb{E}[i_\nu^n(\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y})] = nI_\nu + O(1)$ and $\mathrm{Cov}[i_\nu^n(\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y}),\,i_{\nu'}^n(\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y})] = 
nV_{\nu,\nu'} + O(1)$. 
2) The probability on the right-hand side of (39) can be approximated using a multivariate Berry-Esseen theorem. 

We derive the required moments in the following subsection, and then present the required Berry-Esseen theorem 
in Section IV-B. The remaining details of the proof are omitted to avoid repetition. 
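To see why the choice (40) is convenient, the following short calculation (a sketch, using only the definition $p_0(n) = (n+1)^d$ and (40)) shows that each of the three residual terms in (39) is $O(1/\sqrt{n})$:

```latex
\begin{align*}
p_0(n)\,M_\nu e^{-\gamma_\nu}
  &= (n+1)^{d}\, M_\nu \cdot \frac{1}{M_\nu\, n^{d+1/2}} \\
  &= \Big(1+\frac{1}{n}\Big)^{d} \frac{1}{\sqrt{n}}
  \;\le\; \frac{2^{d}}{\sqrt{n}}, \qquad \nu = 1, 2, 12.
\end{align*}
```

The residual terms are therefore of the same order as the Berry-Esseen error, while the $(d + \frac{1}{2})\log n$ shift in the thresholds is consistent with the $O(\log n)$ third-order term stated in Theorem 1.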



A. Calculation of Moments 

Let $X_{1,i}$ denote the $i$-th entry of $\boldsymbol{X}_1$, and similarly for $X_{2,i}$ and $Y_i$. The first moment of $i_\nu^n$ is easily found by 
writing 

$$\mathbb{E}\big[i_\nu^n(\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y})\big] = \sum_{i=1}^n \mathbb{E}\big[i_\nu(X_{1,i},X_{2,i},Y_i)\big] = nI_\nu, \tag{41}$$

where the last equality follows since, by symmetry, $X_{1,i} \sim Q_1$ and $X_{2,i} \sim Q_2$ for all $i$. The derivation of the 
covariance terms is similar for all $(\nu,\nu')$ pairs, so we focus on $\nu = \nu' = 12$. We wish to compute the 
quantity 

$$\mathrm{Var}\Big[\sum_{i=1}^n i_{12}(X_{1,i},X_{2,i},Y_i)\Big] = \sum_{i=1}^n\sum_{j=1}^n \mathrm{Cov}\big[i_{12}(X_{1,i},X_{2,i},Y_i),\,i_{12}(X_{1,j},X_{2,j},Y_j)\big] \tag{42}$$

$$= n\,\mathrm{Var}\big[i_{12}(X_1,X_2,Y)\big] + (n^2 - n)\,\mathrm{Cov}\big[i_{12}(X_1,X_2,Y),\,i_{12}(X'_1,X'_2,Y')\big], \tag{43}$$

where $(X_1,X_2,Y)$ and $(X'_1,X'_2,Y')$ correspond to two arbitrary but different indices in $\{1,\dots,n\}$ (e.g. one can 
set $(X_1,X_2,Y) = (X_{1,1},X_{2,1},Y_1)$ and $(X'_1,X'_2,Y') = (X_{1,2},X_{2,2},Y_2)$). In (43), we have used the fact that, by 
the symmetry of the codebook construction, the $n$ terms in (42) with $i = j$ are equal, and similarly for the $n^2 - n$ 
terms with $i \ne j$. 

To compute the covariance term in (43), we need the joint distribution of $(X_1,X_2,Y)$ and $(X'_1,X'_2,Y')$. This 
distribution is easily understood by considering the following procedure for generating a codeword uniformly over 
$T^n(Q)$: (i) Fix an arbitrary sequence $\boldsymbol{x} = (x_1,\dots,x_n)$ with composition $Q$; (ii) Randomly choose a symbol 
from the $n$ symbols of $\boldsymbol{x}$ (each with probability $\frac{1}{n}$) and place it in position 1 of the codeword; (iii) From the 
$n - 1$ remaining symbols of $\boldsymbol{x}$, randomly choose one (each with probability $\frac{1}{n-1}$) and place it in position 2 of the 
codeword; (iv) Continue until all $n$ symbols have been placed. Stated more compactly, this procedure generates a 
codeword uniformly over $T^n(Q)$ by randomly permuting the symbols of an arbitrary sequence $\boldsymbol{x} \in T^n(Q)$. 
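The procedure above determines how the symbols in two different positions of a codeword depend on each other. The following sketch enumerates all permutations of a short base sequence and compares the exact law of positions 1 and 2 with the closed form $(nQ(b) - \mathbb{1}\{a = b\})/(n-1)$ for the conditional probability of symbol $b$ in one position given symbol $a$ in the other. The function names are assumptions made for this example, and brute-force enumeration is only feasible for small $n$.

```python
from fractions import Fraction
from itertools import permutations

def pair_law(base):
    """Exact joint law of (position 1, position 2) under a uniform random permutation."""
    counts, total = {}, 0
    for perm in permutations(base):      # repeated symbols are treated as distinct items
        key = (perm[0], perm[1])
        counts[key] = counts.get(key, 0) + 1
        total += 1
    return {key: Fraction(value, total) for key, value in counts.items()}

def predicted_conditional(base, a, b):
    """(n Q(b) - 1{a = b}) / (n - 1), where n Q(b) is the count of b in base."""
    n = len(base)
    return Fraction(base.count(b) - (1 if a == b else 0), n - 1)

def check(base):
    """Return True if enumeration and the closed form agree for every symbol pair."""
    joint, n = pair_law(base), len(base)
    for a in set(base):
        for b in set(base):
            marginal = Fraction(base.count(a), n)
            if joint.get((a, b), Fraction(0)) / marginal != predicted_conditional(base, a, b):
                return False
    return True
```

For instance, `check((0, 0, 0, 1, 2))` compares all nine symbol pairs for a length-5 sequence with composition $(3/5, 1/5, 1/5)$.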

From the above procedure, we conclude that 

$$\mathbb{P}[X'_\nu = x'_\nu \,|\, X_\nu = x_\nu] = \frac{nQ_\nu(x'_\nu) - \mathbb{1}\{x_\nu = x'_\nu\}}{n - 1} \tag{44--45}$$

for $\nu = 1,2$. Let $Q'_\nu(x'_\nu|x_\nu)$ denote the right-hand side of (45). The covariance in (43) is given by 

$$\mathrm{Cov}\big[i_{12}(X_1,X_2,Y),\,i_{12}(X'_1,X'_2,Y')\big] = \mathbb{E}\big[(i_{12}(X_1,X_2,Y) - I_{12})(i_{12}(X'_1,X'_2,Y') - I_{12})\big] \tag{46}$$

$$= \sum_{x_1,x_2,y} Q_1(x_1)Q_2(x_2)W(y|x_1,x_2) \sum_{x'_1,x'_2,y'} Q'_1(x'_1|x_1)Q'_2(x'_2|x_2)W(y'|x'_1,x'_2)\,(i_{12}(x_1,x_2,y) - I_{12})(i_{12}(x'_1,x'_2,y') - I_{12}) \tag{47}$$

$$= T_1 + T_2 + T_3 + T_4, \tag{48}$$

where the four terms in (48) correspond to the four terms in the expansion of $(nQ_1(x'_1) - \mathbb{1}\{x_1 = x'_1\})(nQ_2(x'_2) - 
\mathbb{1}\{x_2 = x'_2\})$ resulting from (45). Specifically, we obtain 



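$$T_1 = \frac{n^2}{(n-1)^2}\Big(\mathbb{E}\big[i_{12}(X_1,X_2,Y) - I_{12}\big]\Big)^2 \tag{49}$$

$$T_2 = -\frac{n}{(n-1)^2}\,\mathbb{E}\big[(i_{12}(X_1,X_2,Y) - I_{12})(i_{12}(X_1,\overline{X}_2,\overline{Y}) - I_{12})\big] \tag{50}$$

$$T_3 = -\frac{n}{(n-1)^2}\,\mathbb{E}\big[(i_{12}(X_1,X_2,Y) - I_{12})(i_{12}(\overline{X}_1,X_2,\overline{\overline{Y}}) - I_{12})\big] \tag{51}$$

$$T_4 = \frac{1}{(n-1)^2}\,\mathbb{E}\big[(i_{12}(X_1,X_2,Y) - I_{12})(i_{12}(X_1,X_2,Y') - I_{12})\big] \tag{52}$$

under the random variables defined in (15), along with $(Y'|X_1,X_2) \sim W(y'|x_1,x_2)$, conditionally independent of 
the other variables appearing in (15). We observe that $T_1 = 0$ and $T_4 = O(n^{-2})$, and hence 

$$\mathrm{Cov}\big[i_{12}(X_1,X_2,Y),\,i_{12}(X'_1,X'_2,Y')\big] = T_2 + T_3 + O\Big(\frac{1}{n^2}\Big) \tag{53}$$

$$= -\frac{n}{(n-1)^2}\,\mathrm{Cov}\big[i_{12}(X_1,X_2,Y),\,i_{12}(X_1,\overline{X}_2,\overline{Y})\big] - \frac{n}{(n-1)^2}\,\mathrm{Cov}\big[i_{12}(X_1,X_2,Y),\,i_{12}(\overline{X}_1,X_2,\overline{\overline{Y}})\big] + O\Big(\frac{1}{n^2}\Big) \tag{54}$$

$$= -\frac{1}{n}\,\mathrm{Cov}\big[i_{12}(X_1,X_2,Y),\,i_{12}(X_1,\overline{X}_2,\overline{Y})\big] - \frac{1}{n}\,\mathrm{Cov}\big[i_{12}(X_1,X_2,Y),\,i_{12}(\overline{X}_1,X_2,\overline{\overline{Y}})\big] + O\Big(\frac{1}{n^2}\Big), \tag{55}$$

where we have used the identity $\frac{n}{(n-1)^2} = \frac{1}{n} + O(\frac{1}{n^2})$. Substituting (55) into (43), we obtain 

$$\mathrm{Var}\Big[\sum_{i=1}^n i_{12}(X_{1,i},X_{2,i},Y_i)\Big] = nV_{12,12} + O(1), \tag{56}$$

where $V_{12,12}$ is defined as in (14) with $\mathcal{U} = \emptyset$. 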



B. A Combinatorial Berry-Esseen Theorem 

Before stating the required Berry-Esseen theorem, we outline some of the relevant literature. A combinatorial 
CLT was proved by Hoeffding [11], who proved the asymptotic normality of random variables of the form 
$\sum_{i=1}^n f_n(i,\pi(i))$, where $f_n$ is a real-valued function taking arguments in $\{1,\dots,n\}$, and $\pi(\cdot)$ is uniformly distributed 
on the set of permutations of $\{1,\dots,n\}$. The rate of convergence (i.e. Berry-Esseen theorem) was studied by 
Bolthausen [17], who proved $O(\frac{1}{\sqrt{n}})$ convergence under fairly general conditions. An extension to the multivariate 
setting was given by Bolthausen and Götze [18]. 

A more general setting is that in which each $f_n(i,j)$ is replaced by a random variable $Z_n(i,j)$, independent of 
$\pi$, such that $Z_n(i,j)$ is independent of $Z_n(i',j')$ whenever $(i,j) \ne (i',j')$. The analysis of each scalar quantity 
$i_\nu^n(\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y})$ (see (29) and (32)) falls into this setting upon identifying 

$$Z_n(i,j) = i_\nu(x_{1,i},x_{2,j},Y_n(i,j)), \tag{57}$$

where $\boldsymbol{x}_1 = (x_{1,1},\dots,x_{1,n})$ and $\boldsymbol{x}_2 = (x_{2,1},\dots,x_{2,n})$ are arbitrary sequences of type $Q_1$ and $Q_2$ respectively, 
and $Y_n(i,j) \sim W(\cdot|x_{1,i},x_{2,j})$. Berry-Esseen theorems for this setting were proved by von Bahr [19] and Ho and 
Chen [20]. 
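For the deterministic case considered by Hoeffding, the mean and variance of the permutation statistic $\sum_i f_n(i,\pi(i))$ have classical closed forms in terms of the doubly centred array. The sketch below compares them with brute-force enumeration for a small array; the function names are assumptions made for this example, and the closed-form variance used here is the standard one for Hoeffding's statistic rather than a formula taken from this paper.

```python
from fractions import Fraction
from itertools import permutations

def exact_moments(a):
    """Mean and variance of S = sum_i a[i][pi(i)] by enumerating all permutations."""
    n = len(a)
    values = [sum(a[i][pi[i]] for i in range(n)) for pi in permutations(range(n))]
    mean = Fraction(sum(values), len(values))
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def hoeffding_moments(a):
    """Closed forms: E[S] = n * grand mean, Var[S] = sum of squared residuals / (n - 1)."""
    n = len(a)
    row = [Fraction(sum(a[i]), n) for i in range(n)]
    col = [Fraction(sum(a[i][j] for i in range(n)), n) for j in range(n)]
    grand = Fraction(sum(map(sum, a)), n * n)
    mean = n * grand
    # Residuals after removing row and column effects (double centring).
    var = sum((a[i][j] - row[i] - col[j] + grand) ** 2
              for i in range(n) for j in range(n)) / (n - 1)
    return mean, var
```

With integer entries both functions return exact rationals, so they can be compared with `==`.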

In our case, a multivariate generalization to random vectors $\boldsymbol{Z}_n(i,j)$ in $\mathbb{R}^3$ is required. The desired Berry-Esseen 
theorem is a special case of a more general result by Loh [21, Thm. 2] for a problem known as Latin hypercube 
sampling. We do not present the problem statement here, but instead refer the reader to [21, p. 2059] for a 
discussion of the special case yielding the combinatorial CLT with a given distribution for $\boldsymbol{Z}_n(i,j)$, and to [21, p. 
2058] for a discussion on handling more general distributions (see also [22, p. 543]). Using these observations, 
the following theorem can be inferred. 



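Theorem 2. (Corollary of [21, Thm. 2]) Fix $d \in \mathbb{Z}_+$, and for each $n \in \mathbb{Z}_+$, let $\boldsymbol{Z}_n(i,j)$ ($i = 1,\dots,n$, $j = 1,\dots,n$) 
be a collection of independent random vectors in $\mathbb{R}^d$. Let $\boldsymbol{S}_n = \sum_{i=1}^n \boldsymbol{Z}_n(i,\pi(i))$, where $\pi(\cdot)$ is uniformly distributed 
on the set of all permutations of $\{1,\dots,n\}$, independently of each $\boldsymbol{Z}_n(i,j)$. Denote the mean vector and covariance 
matrix of $\boldsymbol{S}_n$ by $\boldsymbol{\mu}_n$ and $\boldsymbol{\Sigma}_n$ respectively. If $\boldsymbol{\Sigma}_n \succ \boldsymbol{0}$ and $\max_{i,j} \mathbb{E}[\|\boldsymbol{Z}_n(i,j)\|^3]$ is uniformly bounded in $n$, then 
$\tilde{\boldsymbol{S}}_n \triangleq \boldsymbol{\Sigma}_n^{-1/2}(\boldsymbol{S}_n - \boldsymbol{\mu}_n)$ satisfies 

$$\mathbb{P}[\tilde{\boldsymbol{S}}_n \in \mathcal{A}] - \mathbb{P}[\boldsymbol{Z} \in \mathcal{A}] = O\Big(\frac{1}{\sqrt{n}}\Big) \tag{58}$$

for any convex, Borel measurable set $\mathcal{A} \subseteq \mathbb{R}^d$, where $\boldsymbol{Z} \sim N(\boldsymbol{0},\boldsymbol{I}_{d \times d})$. 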

The desired Berry-Esseen theorem for the present setting follows by identifying 

$$\boldsymbol{Z}_n(i,j) = \boldsymbol{i}(x_{1,i},x_{2,j},Y_n(i,j)), \tag{59}$$

where $\boldsymbol{x}_1 = (x_{1,1},\dots,x_{1,n})$ and $\boldsymbol{x}_2 = (x_{2,1},\dots,x_{2,n})$ are arbitrary sequences of type $Q_1$ and $Q_2$ respectively, 
and $Y_n(i,j) \sim W(\cdot|x_{1,i},x_{2,j})$. The third moment assumption is always satisfied in the discrete memoryless setting, 
due to the uniform bounds in [6, Appendix D]. 

C. General Case 

In the case that $Q_1$ and $Q_2$ do not correspond to types of length $n$, we can simply repeat the above derivation 
using $Q_{1,n}$ and $Q_{2,n}$, defined in Section I-B. A less trivial generalization is that in which $\mathcal{U} \ne \emptyset$, and thus the 
codewords are drawn uniformly over the conditional type class $T^n_{\boldsymbol{u}}(Q_\nu)$ for some $\boldsymbol{u} \in T^n(Q_U)$. In this case, 
the procedure described in Section IV-A for generating a codeword uniformly over the type class is replaced by 
the following procedure. Let $\boldsymbol{x}$ be an arbitrary element of the conditional type class $T^n_{\boldsymbol{u}}(\cdot)$. Instead of randomly 
permuting the entire sequence $\boldsymbol{x}$, a random permutation of the subsequence $\boldsymbol{x}^{(u)}$ corresponding to the indices where 
$\boldsymbol{u}$ equals $u$ is applied independently for each value of $u \in \mathcal{U}$. Due to this independence, the covariance in (42) 
is zero between symbols with different corresponding $u$ values. Within each subsequence, the joint distribution 
between two symbols is similar to that of (44)-(45), with $Q_\nu(\cdot|u)$ replacing $Q_\nu$ and $nQ_U(u)$ replacing $n$. The 
quantity $\boldsymbol{i}^n$ in (31) is replaced by 

$$\boldsymbol{i}^n(\boldsymbol{u},\boldsymbol{x}_1,\boldsymbol{x}_2,\boldsymbol{y}) \triangleq \sum_{i=1}^n \boldsymbol{i}(u_i,x_{1,i},x_{2,i},y_i) \tag{60}$$

$$= \sum_{u}\sum_{i=1}^{nQ_U(u)} \boldsymbol{i}\big(u,x_1^{(u)}(i),x_2^{(u)}(i),y^{(u)}(i)\big), \tag{61}$$

where $x_1^{(u)}(i)$ is the $i$-th entry of $\boldsymbol{x}_1$ for which the corresponding entry of $\boldsymbol{u}$ equals $u$, and similarly for $x_2^{(u)}(i)$ and 
$y^{(u)}(i)$. From Theorem 2, we conclude that under $(\boldsymbol{u},\boldsymbol{X}_1,\boldsymbol{X}_2,\boldsymbol{Y})$, each inner summation in (61) is asymptotically normal with $O\big(\frac{1}{\sqrt{nQ_U(u)}}\big) = O\big(\frac{1}{\sqrt{n}}\big)$ 
convergence. It follows that the overall sum is also asymptotically normal with $O\big(\frac{1}{\sqrt{n}}\big)$ convergence. 

Using the above observations and repeating the analysis of this section, we obtain the more general result of 
Theorem 1. 
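The block-wise permutation procedure described in this subsection can be sketched as follows. The function names `sample_conditional_type` and `joint_type` are assumptions made for this example; the sketch permutes a template codeword independently within each group of positions sharing the same time-sharing symbol, which leaves the joint type with the time-sharing sequence unchanged.

```python
import random
from collections import Counter

def sample_conditional_type(u, x_template, rng):
    """Permute x_template independently within each block of positions sharing a u value."""
    x = list(x_template)
    for value in set(u):
        idx = [i for i, ui in enumerate(u) if ui == value]
        block = [x_template[i] for i in idx]
        rng.shuffle(block)                 # uniform permutation of this subsequence only
        for i, symbol in zip(idx, block):
            x[i] = symbol
    return x

def joint_type(u, x):
    """Empirical joint counts of (u_i, x_i); equal counts mean equal joint type."""
    return Counter(zip(u, x))

# Usage with a hypothetical time-sharing sequence and template codeword.
rng = random.Random(1)
u = [0, 0, 0, 1, 1, 1, 1]
template = ["a", "a", "b", "a", "b", "b", "b"]
codeword = sample_conditional_type(u, template, rng)
```

Every output of `sample_conditional_type` lies in the same conditional type class as the template, which is the property the proof relies on.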